Abstract

This paper gives new results on the recovery of sparse signals using $\ell_1$-norm minimization. We introduce a two-stage $\ell_1$ algorithm equivalent to the first two iterations of the alternating $\ell_1$ relaxation introduced in [5] for an appropriate value of the Lagrange multiplier. The first step consists of the standard $\ell_1$ relaxation. The second step consists of optimizing the $\ell_1$ norm of a subvector whose components are indexed by the $\rho m$ largest components in the first stage. If $\rho$ is set to $\frac{1}{4}$, an intuitive choice motivated by the fact that $\frac{m}{4}$ is an empirical breakdown point for the plain $\ell_1$ exact recovery probability curve, Monte Carlo simulations show that the two-stage $\ell_1$ method outperforms the plain $\ell_1$ in practice.

1 Introduction

The compressed sensing problem is currently the focus of extensive research activity and can be stated as follows: given a sparse vector $x^* \in \mathbb{R}^n$ and an observation matrix $A \in \mathbb{R}^{m \times n}$ with $m \ll n$, try to recover the vector $x^*$ from the small measurement vector $y = Ax^*$. Although the problem consists of solving an underdetermined system of linear equations, enough sparsity makes recovery possible, as shown by the following lemma (where $\Sigma_s$ denotes the set of all $s$-sparse vectors, i.e. vectors whose components are all zero except for at most $s$ of them).

Lemma 1.0.1 [3] If $A$ is any $m \times n$ matrix and $2s \le m$, then the following properties are equivalent:
i. The decoder $\Delta_0(y)$ given by

$$\Delta_0(y) = \operatorname{argmin}_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = y \tag{1.0.1}$$

satisfies $\Delta_0(Ax) = x$ for all $x \in \Sigma_s$,
ii. For any set of indices $T$ with $\#T = 2s$, the matrix $A_T$ has rank $2s$, where $A_T$ stands for the submatrix of $A$ composed of the columns indexed by $T$ only.
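Condition ii can be checked by brute force on very small matrices. The following is a minimal sketch, not part of the original paper: the helper names `rank` and `satisfies_condition_ii` and the two toy matrices are hypothetical, and the enumeration over all column subsets is only practical for tiny $n$.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Exact rank of a matrix (list of rows), by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # Look for a nonzero pivot in column c, at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def satisfies_condition_ii(A, s):
    """True if every column submatrix A_T with #T = 2s has rank 2s."""
    n = len(A[0])
    for T in combinations(range(n), 2 * s):
        A_T = [[row[j] for j in T] for row in A]
        if rank(A_T) < 2 * s:
            return False
    return True

# Toy example: any two columns (1, a), (1, b) with a != b are independent.
A_good = [[1, 1, 1, 1, 1],
          [1, 2, 3, 4, 5]]
# Toy example: columns 0 and 1 are proportional, so condition ii fails for s = 1.
A_bad = [[1, 2, 0],
         [1, 2, 1]]

print(satisfies_condition_ii(A_good, 1), satisfies_condition_ii(A_bad, 1))
```

For `A_bad`, the 1-sparse vectors $(2, 0, 0)$ and $(0, 1, 0)$ have the same image, which is exactly the failure of property i that the lemma predicts.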

1.1 The $\ell_1$ and the Reweighted $\ell_1$ relaxations

The main problem with decoder $\Delta_0$ is that the optimization problem (1.0.1) is in general NP-hard. For this reason, the now standard $\ell_1$ relaxation strategy is adopted, i.e. the decoder $\Delta_1(y)$ is obtained as

$$\Delta_1(y) = \operatorname{argmin}_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = y. \tag{1.1.1}$$

Now, solving (1.1.1) can be done in polynomial time and thus $\Delta_1(y)$ can be efficiently computed. The second problem is to give robust conditions under which exact recovery holds. One such condition was given by Candès, Romberg and Tao [1] and is now known as the Uniform Uncertainty Principle (UUP) or as the Restricted Isometry Property (RIP).
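To make the polynomial-time claim concrete, problem (1.1.1) can be written as a linear program by splitting $x = p - q$ with $p, q \ge 0$. The sketch below is an illustration under stated assumptions, not the author's code: it assumes NumPy and SciPy's `linprog` are available, and the function name `l1_decoder`, the Gaussian test matrix and the problem sizes are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog

def l1_decoder(A, y):
    """Solve min ||x||_1 s.t. Ax = y through the split x = p - q, p >= 0, q >= 0."""
    n = A.shape[1]
    c = np.ones(2 * n)                   # sum(p) + sum(q) equals ||x||_1 at an optimum
    A_eq = np.hstack([A, -A])            # A p - A q = y
    res = linprog(c, A_eq=A_eq, b_eq=y)  # linprog's default bounds are p, q >= 0
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

# Assumed test instance: Gaussian A and a 3-sparse x_star.
rng = np.random.default_rng(0)
m, n, s = 20, 50, 3
A = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_star

x_hat = l1_decoder(A, y)
print("max recovery error:", np.max(np.abs(x_hat - x_star)))
```

Since $x^*$ is feasible, the returned minimizer always satisfies $\|\hat{x}\|_1 \le \|x^*\|_1$; whether $\hat{x} = x^*$ depends on the instance.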

One of the main remaining challenges is to reduce the number of observations $m$ needed to recover a given sparse signal $x$. One idea is the use of $\ell_p$, $0 < p < 1$, decoders $\Delta_p(y)$. The main drawback of the approach using $\ell_p$, $0 < p < 1$, norm minimization is that the resulting decoding scheme is again NP-hard. Another idea is to use a reweighted $\ell_1$ approach as proposed in [S].

The main intuition behind this reweighted $\ell_1$ relaxation is the following. The greater the component $x_i$ becomes, the smaller the weight it should receive, since it can be considered that this component should not be set to zero.

The main drawback of the reweighted $\ell_1$ approach is that an unknown parameter must be tuned, whose order of magnitude is hard to know ahead of time.
Algorithm 1 Reweighted $\ell_1$ algorithm (Rew-$\ell_1$)

Input $u > 0$ and $L \in \mathbb{N}^*$
$x_u^{(0)} \in \operatorname{argmin}_{x \in \mathbb{R}^n,\ Ax = y} \|x\|_1$
$l = 1$
while $l \le L$ do
    $z_u^{(l)} = \ldots$ componentwise
end while
Output $z_u^{(L)}$ and $x_u^{(L)}$.
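As an illustration of a reweighting loop of this kind, the sketch below uses the weight rule $z_i = 1/(|x_i| + u)$, which is common in the reweighted $\ell_1$ literature. This rule is an assumption made here for the example and is not taken from the algorithm listing above. NumPy, SciPy's `linprog`, and the names `weighted_l1` and `reweighted_l1` are likewise assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, y, w):
    """Solve min sum_i w_i |x_i| s.t. Ax = y, writing x = p - q with p, q >= 0."""
    n = A.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]), b_eq=y)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

def reweighted_l1(A, y, u, L):
    """Plain l1 start followed by L reweighting passes."""
    n = A.shape[1]
    x = weighted_l1(A, y, np.ones(n))   # x^(0): plain l1 minimizer
    z = np.ones(n)
    for _ in range(L):
        z = 1.0 / (np.abs(x) + u)       # ASSUMED weight rule, applied componentwise
        x = weighted_l1(A, y, z)        # weighted l1 step with the new weights
    return z, x

# Assumed test instance.
rng = np.random.default_rng(1)
A = rng.standard_normal((15, 40))
x_star = np.zeros(40)
x_star[[3, 17, 29]] = [1.5, -2.0, 0.7]
y = A @ x_star

z_hat, x_hat = reweighted_l1(A, y, u=0.1, L=3)
print("max recovery error:", np.max(np.abs(x_hat - x_star)))
```

With this rule, large components receive small weights, in line with the intuition stated above, and the parameter `u` is the quantity that has to be tuned.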



1.2 The Alternating $\ell_1$ algorithm

Another approach was proposed in [5] and uses Lagrange duality. Let us write down problem (1.0.1), to which $\Delta_0$ is the solution map, as the following equivalent problem

$$\max_{z \in \{0,1\}^n,\ x \in \mathbb{R}^n} e^t z \quad \text{s.t.} \quad z_i x_i = 0,\ i = 1, \ldots, n, \quad Ax = y$$

where $e$ denotes the vector of all ones. Here, since the sum of the $z_i$'s is maximized, the variable $z$ plays the role of an indicator function for the event that $x_i = 0$. This problem is clearly nonconvex due to the quadratic equality constraints $z_i x_i = 0$, $i = 1, \ldots, n$. However, these constraints can be merged into the unique constraint $\|D(z)x\|_1 = 0$, where $D(z)$ denotes the diagonal matrix whose diagonal is $z$, leading to the following equivalent problem

$$\max_{z \in \{0,1\}^n,\ x \in \mathbb{R}^n} e^t z \quad \text{s.t.} \quad \|D(z)x\|_1 = 0, \quad Ax = y. \tag{1.2.1}$$

The Alternating $\ell_1$ algorithm consists of a suboptimal alternating minimization procedure to approximate the dual function at $u$. The algorithm is as follows.



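Algorithm 2 Alternating $\ell_1$ algorithm (Alt-$\ell_1$)
Input $u > 0$ and $L \in \mathbb{N}^*$
$z_u^{(0)} = e$
$x_u^{(0)} \in \operatorname{argmax}_{x \in \mathbb{R}^n,\ Ax = y} \mathcal{L}(x, z_u^{(0)}, u)$
$l = 1$
while $l \le L$ do
    $z_u^{(l)} \in \operatorname{argmax}_{z \in \{0,1\}^n} \mathcal{L}(x_u^{(l-1)}, z, u)$
    $x_u^{(l)} \in \operatorname{argmax}_{x \in \mathbb{R}^n,\ Ax = y} \mathcal{L}(x, z_u^{(l)}, u)$
    $l \leftarrow l + 1$
end while
Output $z_u^{(L)}$ and $x_u^{(L)}$.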



Notice that, similarly to the reweighted $\ell_1$ algorithm, the Alternating $\ell_1$ method also requires the tuning of an unknown parameter $u$. However, the main motivation for this proposal is that this parameter $u$ has a clear meaning: it is a dual variable which, in the case where the dual function $\theta(u)$ is well approximated by the sequence $\mathcal{L}(x^{(l)}, z^{(l)}, u)$, can be efficiently optimized without additional prior information, due to the convexity of the dual function.
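The two alternating updates can be written in closed form once a Lagrangian is fixed. The worked derivation below assumes the natural Lagrangian obtained by dualizing the constraint $\|D(z)x\|_1 = 0$ of (1.2.1) with multiplier $u > 0$; this explicit form is an assumption made here, since the text above does not spell it out.

```latex
\begin{align*}
\mathcal{L}(x, z, u) &= e^t z - u\,\|D(z)x\|_1
                      = \sum_{i=1}^n z_i \bigl(1 - u\,|x_i|\bigr), \\
z_i^{(l)} &= \begin{cases}
               1 & \text{if } |x_i^{(l-1)}| < 1/u, \\
               0 & \text{otherwise,}
             \end{cases} \\
x^{(l)} &\in \operatorname{argmin}_{Ax = y} \; \sum_{i \,:\, z_i^{(l)} = 1} |x_i| .
\end{align*}
```

Under this assumption, the $z$-step is a componentwise threshold at $1/u$ (ties at $|x_i| = 1/u$ can be broken either way), and the $x$-step is an $\ell_1$ minimization restricted to the components that the threshold marks as candidates for zero. With $z^{(0)} = e$, the first $x$-step is the plain $\ell_1$ relaxation.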



2 The two stage $\ell_1$ method

The main remark about the alternating $\ell_1$ method is the following (see [5]): for a given dual variable $u$, the alternating $\ell_1$ algorithm can be seen as a sequence $(x_u^{(l)})_{l \in \mathbb{N}}$ of truncated $\ell_1$-norm minimizers of the type

$$x_u^{(l)} = \operatorname{argmin}_{x \in \mathbb{R}^n} \|x_{T_u^l}\|_1 \quad \text{s.t.} \quad Ax = y, \tag{2.0.2}$$

where $T_u^l$ is the set of indices $i$ for which $|x_{u,i}^{(l-1)}| < \frac{1}{u}$. Therefore, the Alternating $\ell_1$ algorithm can be seen as an iterative thresholding scheme with threshold value equal to $\frac{1}{u}$. Now assume for instance that a fraction $\rho m$ of the non zero components is well identified by the plain $\ell_1$ step with solution $x^{(0)}$. Then, the practitioner might ask if the appropriate value for $u$ is the one which imposes an $\ell_1$ penalty on the index set corresponding to the $n - \rho m$ smallest components of $x^{(0)}$. Moreover, the large scale simulation experiments which have been performed on the plain $\ell_1$ relaxation seemed to agree on the fact that the breakdown point occurs near $\frac{m}{4}$. Thus, a practitioner could be tempted to wonder whether $\rho = \frac{1}{4}$ is a sensible value. Motivated by the previous practical considerations, the two stage $\ell_1$ algorithm is defined as follows (the parameter $u$ is now replaced by the parameter $\rho$).

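Algorithm 3 Two stage $\ell_1$ algorithm (2Stage-$\ell_1$)
Input $\rho \in (0, \frac{1}{2})$
Step 0: $x^{(0)} \in \operatorname{argmin}_{x \in \mathbb{R}^n,\ Ax = y} \|x\|_1$; $T$ = index set of the $\rho m$ largest (in absolute value) components of $x^{(0)}$
Step 1: $x_\rho^{(1)} \in \operatorname{argmin}_{x \in \mathbb{R}^n,\ Ax = y} \|x_{T^c}\|_1$
Output $x_\rho^{(1)}$.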



Notice that we restrict $\rho$ to lie in $(0, \frac{1}{2})$. The reason should be obvious since, due to Lemma 1.0.1, even decoder $\Delta_0(y)$ is unable to identify more than $\frac{m}{2}$-sparse vectors. Another remark is that the procedure could be continued for more than 2 steps, but simulation experiments of the Alternating $\ell_1$ method seem to confirm that in most cases two steps suffice to converge.
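The two steps of the method can be sketched with a weighted $\ell_1$ linear program in which the weights on $T$ are set to zero. This is a minimal illustration under stated assumptions, not the author's program: NumPy and SciPy's `linprog` are assumed, and the names `weighted_l1` and `two_stage_l1`, the rounding of $\rho m$ with `int`, and the Gaussian test instance are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, y, w):
    """Solve min sum_i w_i |x_i| s.t. Ax = y, writing x = p - q with p, q >= 0."""
    n = A.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]), b_eq=y)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

def two_stage_l1(A, y, rho=0.25):
    """Step 0: plain l1. Step 1: l1 penalty on the complement of T only."""
    m, n = A.shape
    x0 = weighted_l1(A, y, np.ones(n))
    k = int(rho * m)                        # assumed rounding of rho * m
    T = np.argsort(-np.abs(x0))[:k]         # indices of the k largest |x0_i|
    w = np.ones(n)
    w[T] = 0.0                              # drop the penalty on T
    x1 = weighted_l1(A, y, w)
    return x0, x1, T

# Assumed test instance: Gaussian A and an 8-sparse x_star.
rng = np.random.default_rng(2)
m, n, s = 20, 50, 8
A = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_star

x0, x1, T = two_stage_l1(A, y)
Tc = np.setdiff1d(np.arange(n), T)
print("plain l1 error:    ", np.max(np.abs(x0 - x_star)))
print("two-stage l1 error:", np.max(np.abs(x1 - x_star)))
```

Because $x^{(0)}$ is feasible for Step 1, the Step 1 solution always satisfies $\|x^{(1)}_{T^c}\|_1 \le \|x^{(0)}_{T^c}\|_1$; whether it is closer to $x^*$ depends on how well $T$ captures the true support.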

3 Main results 

At Step 1 of the method, a subset $T$ is selected with cardinal $\rho m$ and optimization is then performed with objective function $\|x_{T^c}\|_1$. In this section, we will adopt the following notations: $S$ will denote the support of $x^*$, and $T$ will denote the index set of the $\rho m$ largest components of $x^{(0)}$ as defined in the two-stage $\ell_1$ algorithm. $T^c_g$ will be an abbreviation of $(T^c)_g$, the "good" subset of $T^c$ or, in mathematical terms, the subset of indices of $S$ which also belong to $T^c$. On the other hand, $T^c_b$ will denote the complement of $T^c_g$ in $T^c$.

Lemma 3.0.1 Assume the cardinal of $T^c_g$ is less than $\gamma s/2$ and that $A$ satisfies $RIP(\delta, \gamma s)$. Let $h^{(1)} = x^{(1)} - x^*$. Then, there exists a positive number $C^*$ depending on $x^*$ such that $\|h^{(1)}_T\|_1 \le C^* \|x^*_{T^c_b}\|_1$. Moreover, if $x^*_{T^c_b} = 0$, then $\|h^{(1)}_T\|_1 = 0$.

Proof. Let $N(h_T)$ denote the optimal value of the problem

min I Ix^c + /iT<= 111 (3.0.3) 

subject to 

AT^{x*rr. + hr^) + At^x^ = y - AT{x*rr + hr). 

Assume that $x^*_{T^c_b} = 0$. Then, $N(h_T)$ plays the role of a norm for $h_T$ although it does not satisfy the triangle inequality. In particular, $N(h_T)$ is nonnegative, convex and $N(h_T) = 0$ implies that $h_T = 0$.

Nonnegativity and convexity are straightforward. Assume that $N(h_T) = 0$, i.e. the solution $h$ of (3.0.3) is null.
null. This implies that Aycx^i = y — Atx^c — At^xt^, which implies that xi^} — xt^ is in the kernel of Ax^j. 

Using the fact that has cardinal less that 7s/2 and the RIP{S, 7s) assumption, we conclude that x'^} — xt" ■ 
In order to finish the proof of the lemma, it remains to recall that N{hT) is convex and that, by Theorem 1.1 
in [7], 11/1*^^^111 (and thus ||/i^^''||i) is bounded from above by C \\xu^\\i in order to obtain existence of 
a sufficiently small positive constant C* depending on x* such that N{hT) > C* || /it ||i for all hx in the ball 
B(0, C||a;*||i). The desired result then follows. 

To prove that $N(h_T) = 0$ if $h_T = 0$ is a bit harder. Thus, assume that $h_T = 0$. Then, the solution $h$ of (3.0.3) is just the solution of

min _ II /it- II 1 hr- = y - At - x^ - At^ a; ^ . 

Now since y — Ax* , we obtain that y — At — — At^x'^P — AT<i{x'^a — x')^^) and thus, the right hand side 
term is nothing but the image of a 7s/2-sparse vector. Now, recalling that we assumed RIP{5,^s), Theorem 
1.1 in 7 implies that h must be the sparsest solution of the system At^Ht^ = y — Atx^ — AycXg^-* from which 
we deduce that h is 7s/2-sparse. Therefore the vector (/it^c, a;^]) is 7s which solves At^xt" — y ^ Atx^. On 



the other hand, x^c also solves Atcxt<^ — U ~ Atx'^ and its support is included in the support of (/it^c, ). 

Therefore, {hx^^xi^}) — x^a is a 7s/2 sparse vector which lies in the kernel of A. Using again the fact that 

RIP{S,js) holds, we conclude that {hT^,xi^}) — x^a = 0. Thus, Ht^ — and x^}) = x^o- □ 
Using this lemma, we deduce the following theorem. 

Theorem 3.0.2 Assume that $RIP(\delta, \gamma s)$ holds and that an index set $T_g$ of cardinal greater than or equal to $(1 - \gamma/2)s$ has been recovered at Step 0 after thresholding. Then $x^{(1)}$ satisfies

$$\|x^{(1)} - x^*\|_1 \le C^{**} \|x^*_{T^c_b}\|_1$$

for some constant $C^{**}$ depending on $x^*$.
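Proof. The vector $x^{(1)}$ satisfies

$$\|x^{(1)}_{T^c}\|_1 \le \|x^*_{T^c}\|_1. \tag{3.0.4}$$

Let us write $h = x^{(1)} - x^*$. Using (3.0.4), a now standard decomposition gives

$$\|x^*_{T^c_g}\|_1 - \|h_{T^c_g}\|_1 + \|h_{T^c_b}\|_1 - \|x^*_{T^c_b}\|_1 \le \|x^*_{T^c_g}\|_1 + \|x^*_{T^c_b}\|_1.$$

We thus obtain

$$\|h_{T^c_b}\|_1 \le \|h_{T^c_g}\|_1 + 2\|x^*_{T^c_b}\|_1. \tag{3.0.5}$$

However, since $RIP(\delta, \gamma s)$ holds, the null space property $NSP(C, \gamma s/2)$ holds too, with $C < 1$. Therefore, we obtain that

$$\|h_{T^c_g}\|_1 \le C\left(\|h_{T^c_b}\|_1 + \|h_T\|_1\right). \tag{3.0.6}$$

Combining (3.0.5) and (3.0.6), we obtain

$$\|h_{T^c_b}\|_1 \le \frac{C}{1 - C}\|h_T\|_1 + \frac{2}{1 - C}\|x^*_{T^c_b}\|_1. \tag{3.0.7}$$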

As a consequence, we obtain that 

\\hh <C{2\\x*T.h + C'\\hTh) + \\hT\\i + 2\\x*^^4^ 
+C\\hT\\i + WhrWi 
= il + C + CC')\\hT\\i + 2(1 + C)\\x*j.. 111. 

which, using Lemma 3.0.1, implies

$$\|h\|_1 \le \left((1 + C + CC')C^* + 2(1 + C)\right)\|x^*_{T^c_b}\|_1, \tag{3.0.8}$$

which is the desired bound. □
The following corollary is a straightforward consequence of the previous theorem. 

Corollary 3.0.3 Assume that the assumptions of Theorem 3.0.2 are satisfied. Then, exact reconstruction is obtained if $x^*_{T^c_b} = 0$, i.e. $x^*$ is $s$-sparse.

4 Monte Carlo experiments 

The following Monte Carlo experiments show that the performance of the two-stage $\ell_1$ algorithm, which drops the penalty over the index set of the $m/4$ largest components of the solution of plain $\ell_1$, is almost as good as the performance of the reweighted $\ell_1$ with the best parameter, which is usually unknown in practice. A Python program is available at http://stephane.g.chretien.googlepages.com/alternatingll and can be used to perform these experiments and others involving the Alternating $\ell_1$ algorithm.
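A Monte Carlo comparison of this kind can be sketched as follows. This is not the program referenced above: the Gaussian measurement matrices, the Gaussian nonzero entries, the success tolerance `tol`, the problem sizes, and the names `trial` and `success_rates` are all assumptions made for illustration, and NumPy and SciPy's `linprog` are assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(A, y, w):
    """Solve min sum_i w_i |x_i| s.t. Ax = y, writing x = p - q with p, q >= 0."""
    n = A.shape[1]
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]), b_eq=y)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

def trial(rng, m, n, s, rho, tol=1e-5):
    """One random instance; returns (plain l1 success, two-stage l1 success)."""
    A = rng.standard_normal((m, n))
    x_star = np.zeros(n)
    x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    y = A @ x_star
    x0 = weighted_l1(A, y, np.ones(n))                  # plain l1
    w = np.ones(n)
    w[np.argsort(-np.abs(x0))[: int(rho * m)]] = 0.0    # no penalty on T
    x1 = weighted_l1(A, y, w)                           # second stage
    def recovered(x):
        return np.max(np.abs(x - x_star)) <= tol
    return recovered(x0), recovered(x1)

def success_rates(m, n, s, rho=0.25, trials=20, seed=0):
    """Empirical exact-recovery rates (plain l1, two-stage l1) over random trials."""
    rng = np.random.default_rng(seed)
    results = np.array([trial(rng, m, n, s, rho) for _ in range(trials)])
    return results.mean(axis=0)

# Small assumed grid of sparsity levels; increase `trials` for smoother curves.
rates = {s: success_rates(m=20, n=40, s=s, trials=10) for s in (2, 6, 10)}
for s, (plain, two_stage) in rates.items():
    print(f"s={s}: plain l1 {plain:.2f}, two-stage l1 {two_stage:.2f}")
```

Plotting the two rates against $s$ for fixed $m$ and $n$ gives the exact recovery probability curves whose breakdown points are compared in the text.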